Interactive data exploration in a notebook with hvPlot¶

JupyterCon 2023 - Paris

1. Interactive plotting¶

In [1]:
import pandas as pd
from bokeh.sampledata.penguins import data as df

df.head()
Out[1]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 MALE
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 FEMALE
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 FEMALE
3 Adelie Torgersen NaN NaN NaN NaN NaN
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 FEMALE

Pandas .plot()¶

In [2]:
df.plot.scatter(x='bill_length_mm', y='bill_depth_mm')
Out[2]:
<Axes: xlabel='bill_length_mm', ylabel='bill_depth_mm'>

Enabling .hvplot¶

To enable the .hvplot accessor on pandas objects, import hvplot.pandas.

In [3]:
import hvplot.pandas

df.hvplot.scatter(
    x='bill_length_mm', y='bill_depth_mm',
    by='species'
)
Out[3]:
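
The reason a bare import is enough to make a new attribute appear on every DataFrame can be illustrated with pandas' public accessor registration API. This is a minimal sketch of the general mechanism, not a description of how hvPlot is implemented internally; the accessor name demo_plot and the class DemoPlotAccessor are hypothetical and only describe a plot instead of drawing one.

```python
import pandas as pd


# Registering an accessor attaches a new namespace to every DataFrame.
# "demo_plot" is a made-up name used only for this illustration.
@pd.api.extensions.register_dataframe_accessor("demo_plot")
class DemoPlotAccessor:
    """Hypothetical accessor that describes a scatter plot as text."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._df = pandas_obj

    def scatter(self, x: str, y: str) -> str:
        # Count only the rows where both columns are present.
        n_points = len(self._df[[x, y]].dropna())
        return f"scatter of {y} vs {x} ({n_points} points)"


df_demo = pd.DataFrame(
    {
        "bill_length_mm": [39.1, 39.5, None],
        "bill_depth_mm": [18.7, 17.4, 18.0],
    }
)
description = df_demo.demo_plot.scatter(x="bill_length_mm", y="bill_depth_mm")
print(description)
```

Once the class is registered, the namespace is available on DataFrames created before or after the registration, which matches the behaviour seen above with import hvplot.pandas.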

Hover¶

In [4]:
df.hvplot.scatter(
    x='bill_length_mm', y='bill_depth_mm',
    by='species',
    hover_cols=['sex', 'island']
)
Out[4]:

Subplots¶

In [5]:
df.hvplot.scatter(
    x='bill_length_mm', y='bill_depth_mm',
    by='species',
    subplots=True, width=250
)
Out[5]:

Explore the parameter space with groupby¶

In [6]:
df.hvplot.scatter(
    x='bill_length_mm', y='bill_depth_mm',
    groupby=['species', 'sex'], width=500,
)
Out[6]:
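
To make the idea of a parameter space concrete, the sketch below uses plain pandas on the five rows shown in Out[1]. It only illustrates the partitioning that a (species, sex) selection implies: each combination of values corresponds to one subset of rows. It makes no claim about how hvPlot builds its widgets, and note that pandas drops rows with a missing key by default, which may or may not match what hvPlot does.

```python
import pandas as pd

# The first five penguin rows from Out[1], restricted to the columns used here.
penguins = pd.DataFrame(
    {
        "species": ["Adelie"] * 5,
        "sex": ["MALE", "FEMALE", "FEMALE", None, "FEMALE"],
        "bill_length_mm": [39.1, 39.5, 40.3, float("nan"), 36.7],
        "bill_depth_mm": [18.7, 17.4, 18.0, float("nan"), 19.3],
    }
)

# One group per (species, sex) combination; the row with a missing sex is dropped.
group_sizes = penguins.groupby(["species", "sex"]).size()
print(group_sizes)

# Picking one combination is equivalent to filtering on both keys.
subset = penguins[(penguins["species"] == "Adelie") & (penguins["sex"] == "FEMALE")]
print(subset)
```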

Interact with the returned objects¶

Objects returned by .hvplot calls are HoloViews objects.

In [7]:
scatter_bill_depth = df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', width=300)
print(scatter_bill_depth)
:Scatter   [bill_length_mm]   (bill_depth_mm)
In [8]:
hist_bill_length = df.hvplot.hist('bill_length_mm', width=300)

HoloViews objects can be composed.

In [9]:
scatter_bill_depth + hist_bill_length
Out[9]:
In [10]:
p1 = df.query('species == "Adelie"').hvplot.scatter(x='body_mass_g', y='bill_depth_mm', c='red', label='Adelie')
p2 = df.query('species == "Gentoo"').hvplot.scatter(x='body_mass_g', y='bill_depth_mm', c='blue', label='Gentoo')
p1 * p2
Out[10]:

Linked selection/brushing¶

In [11]:
hist_bill_depth = df.hvplot.hist('bill_depth_mm', width=300)
In [12]:
import holoviews as hv

ls = hv.link_selections.instance()
In [13]:
ls(hist_bill_length + hist_bill_depth)
Out[13]:

Large data¶

With rasterize=True, .hvplot() can display large datasets interactively thanks to Datashader.

In [14]:
flights = pd.read_parquet('airline_flights')
len(flights)
Out[14]:
918205
In [15]:
flights.hvplot.scatter(x='distance', y='airtime', rasterize=True)
Out[15]:
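
The core idea behind rasterizing can be sketched with NumPy alone: instead of sending every point to the browser, the points are aggregated into a fixed-size grid of counts, so the amount of data to draw depends on the grid size rather than on the number of rows. This is a simplified illustration under stated assumptions, not Datashader's actual pipeline. The data below is synthetic and merely reuses the column names distance and airtime; the 400 x 300 grid is an arbitrary choice.

```python
import numpy as np

# Synthetic stand-in for the flights table (values are made up).
rng = np.random.default_rng(0)
n_points = 200_000
distance = rng.uniform(0, 3000, n_points)
airtime = distance / 8 + rng.normal(0, 15, n_points)

# Aggregate all points into a 400 x 300 grid of counts.
counts, x_edges, y_edges = np.histogram2d(distance, airtime, bins=(400, 300))

# The grid has a fixed size, yet every point is accounted for.
print(counts.shape, int(counts.sum()))
```

The resulting array can be drawn as an image, and an interactive tool can repeat the aggregation over the visible range whenever the user zooms or pans.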

2. Interactive data pipelines¶

In [16]:
import panel as pn
import xarray as xr
In [17]:
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air
ds
Out[17]:
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Enabling .interactive¶

To enable .interactive on an object, import the hvPlot extension for its library (here, hvplot.xarray).

In [18]:
import hvplot.xarray

Static to interactive pipeline¶

Here's an example of a very simple pipeline made of two chained method calls. We want to replace the fixed value of 0 passed to .isel with a widget, and see how the output changes.

In [19]:
# static pipeline
air.isel(time=0).mean()
Out[19]:
<xarray.DataArray 'air' ()>
array(274.16626, dtype=float32)
Coordinates:
    time     datetime64[ns] 2013-01-01

We create a Panel IntSlider widget that produces values suitable for passing to .isel.

In [20]:
slider = pn.widgets.IntSlider(name='time', start=0, end=10)
slider

Out[20]:

We start the interactive pipeline by calling .interactive() on the data structure. The object it returns exposes the same API as the underlying data.

In [22]:
pipeline = air.interactive()
print(pipeline)

<hvplot.xarray.XArrayInteractive object at 0x2ba068b80>

We then build the interactive pipeline by replacing 0 with the widget we've just created.

In [23]:
# interactive pipeline
pipeline.isel(time=slider).mean()

Out[23]:

The output of the interactive pipeline includes the set of widgets that drive it, together with the pipeline output, which is updated whenever a widget value changes.
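
One way to picture what such a wrapper does is a toy object that records method calls instead of executing them, and replays them with the current widget values when asked. The sketch below uses only the standard library. ToySlider, ToyArray and ToyInteractive are hypothetical names invented for this illustration; this is not hvPlot's implementation, and unlike the real thing it has to be re-evaluated by hand.

```python
class ToySlider:
    """Hypothetical stand-in for a widget: it just holds a value."""

    def __init__(self, value=0):
        self.value = value


class ToyArray:
    """Tiny stand-in for a DataArray: a list of time steps, each a list of values."""

    def __init__(self, frames):
        self.frames = frames

    def isel(self, time):
        # Select a single time step by position.
        return ToyArray([self.frames[time]])

    def mean(self):
        values = [v for frame in self.frames for v in frame]
        return sum(values) / len(values)


class ToyInteractive:
    """Records method calls and replays them with current widget values."""

    def __init__(self, obj, steps=()):
        self._obj = obj
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. pipeline methods.
        def record(*args, **kwargs):
            return ToyInteractive(self._obj, self._steps + ((name, args, kwargs),))

        return record

    def eval(self):
        current = self._obj
        for name, args, kwargs in self._steps:
            # Swap each widget for its current value just before the call.
            args = [a.value if isinstance(a, ToySlider) else a for a in args]
            kwargs = {
                k: (v.value if isinstance(v, ToySlider) else v)
                for k, v in kwargs.items()
            }
            current = getattr(current, name)(*args, **kwargs)
        return current


toy_air = ToyArray([[1.0, 3.0], [10.0, 20.0]])
toy_slider = ToySlider(value=0)
toy_pipeline = ToyInteractive(toy_air).isel(time=toy_slider).mean()

first = toy_pipeline.eval()   # evaluated with time=0
toy_slider.value = 1
second = toy_pipeline.eval()  # same pipeline, evaluated with time=1
print(first, second)
```

Because the pipeline stores the steps rather than a result, the same object yields a different output after the widget value changes.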

Plotting¶

.hvplot is well supported, but an interactive pipeline can also output a Matplotlib plot generated by Xarray's .plot.

In [24]:
time = pn.widgets.Player(name='time', start=0, end=10, loop_policy='loop', interval=500)

air.interactive(loc='bottom', align='center').isel(time=time).plot()

Out[24]:
In [25]:
air.interactive(loc='bottom', align='center').isel(time=time).hvplot()

Out[25]:

More complex pipeline¶

In [26]:
temp_unit = pn.widgets.Select(name='Temperature unit', options={'K': 0, 'C': 273.15})

pipeline = (
    air.interactive()
    .isel(time=pn.widgets.IntRangeSlider)   # Infer the values automatically
    .to_dataframe()                         # Compute a Pandas DataFrame
    .groupby('time')
    .mean()
)
pipeline = pipeline - temp_unit             # Support for operators
pipeline.hvplot.line('time', 'air', width=500)

Out[26]:

Functions as input¶

In [28]:
from datetime import datetime
import yfinance

# Widget for the function as input
w_ticker = pn.widgets.Select(name='Ticker', options=['NVDA', 'AAPL', 'IBM', 'GOOG', 'MSFT'])

# Define a loading function that returns a Pandas DataFrame
def load_from_yahoo(ticker: str) -> pd.DataFrame:
    return yfinance.download(
        ticker, start=datetime(2020, 1, 3), end=datetime(2022, 4, 29), progress=False
    )

# Bind the function to a widget and make the bound function interactive.
pipeline = hvplot.bind(load_from_yahoo, w_ticker).interactive(loc='left')

# Widgets for the regular pipeline
w_resample = pn.widgets.RadioButtonGroup(options=['W', 'M'])
w_dt_range = pn.widgets.DateRangeSlider(start=datetime(2020, 1, 3), end=datetime(2022, 4, 29))

(
    pipeline
    .loc[(pipeline.index >= w_dt_range.param.value_start) & (pipeline.index <= w_dt_range.param.value_end)]
    .resample(w_resample).agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
    .hvplot.ohlc(grid=True, title=w_ticker, width=500)
)

Out[28]:
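
The resample and agg step in the pipeline above can be checked in isolation with plain pandas. The sketch below applies the same aggregation dictionary to ten days of made-up prices (not real market data), starting on Monday 2020-01-06, and resamples them to weekly bars with 'W'.

```python
import pandas as pd

# Ten consecutive days of synthetic prices, starting on a Monday.
dates = pd.date_range("2020-01-06", periods=10, freq="D")
base = pd.Series(range(100, 110), index=dates, dtype="float64")
daily = pd.DataFrame(
    {"Open": base, "High": base + 2, "Low": base - 2, "Close": base + 1}
)

# Same aggregation as in the pipeline: first open, highest high,
# lowest low and last close within each weekly bin.
weekly = daily.resample("W").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last"}
)
print(weekly)
```

With the default weekly anchor (weeks ending on Sunday), the first bar covers 6 to 12 January and the second covers the remaining three days.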

Wrap it in a Panel template¶

In [29]:
diff = air.interactive.sel(time=pn.widgets.DiscreteSlider) - air.mean('time')
kind = pn.widgets.Select(options=['contourf', 'contour', 'image'], value='image')
interactive = diff.hvplot(cmap='RdBu_r', clim=(-20, 20), kind=kind)

template = pn.template.BootstrapTemplate(
    title='Interactive pipeline',
    sidebar=["## Select a time and type of plot", *interactive.widgets()],
    main=[interactive.panel()],
)

template.show();

Launching server at http://localhost:55483

WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: Row(id='p5712', ...)

3. Low-code data explorer¶

In [30]:
hvexplorer = hvplot.explorer(df)
hvexplorer

Out[30]:

The explorer can be used to explore data and edit plots. Once you are satisfied with a plot, you can save its settings with settings() or get a string with plot_code() that you can copy/paste and execute to reproduce the plot.

In [31]:
plot_settings = hvexplorer.settings()
plot_settings

Out[31]:
{'by': ['species'],
 'kind': 'scatter',
 'title': 'JupyterCon23',
 'x': 'bill_length_mm',
 'y': ['bill_depth_mm']}
In [32]:
df.hvplot(**plot_settings)

Out[32]:
In [33]:
hvexplorer.plot_code()

Out[33]:
"df.hvplot(by=['species'], kind='scatter', title='JupyterCon23', x='bill_length_mm', y=['bill_depth_mm'])"
In [35]:
df.hvplot(by=['species'], kind='scatter', title='JupyterCon23', x='bill_length_mm', y=['bill_depth_mm'])

Out[35]:
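
The relationship between the settings dictionary in Out[31] and the code string in Out[33] can be sketched in a few lines of plain Python: each key becomes a keyword argument and each value is written with its repr. The helper settings_to_code is hypothetical and written only for this illustration; it is not part of hvPlot and says nothing about how plot_code() is implemented.

```python
def settings_to_code(settings: dict, var_name: str = "df") -> str:
    """Render a settings dict as a call string (hypothetical helper)."""
    # Sort the keys so the output is stable, and use repr() so strings
    # and lists are written as valid Python literals.
    args = ", ".join(f"{key}={value!r}" for key, value in sorted(settings.items()))
    return f"{var_name}.hvplot({args})"


# The settings shown in Out[31].
plot_settings_demo = {
    "by": ["species"],
    "kind": "scatter",
    "title": "JupyterCon23",
    "x": "bill_length_mm",
    "y": ["bill_depth_mm"],
}

code = settings_to_code(plot_settings_demo)
print(code)
```

Since the settings are ordinary keyword arguments, unpacking the dictionary with ** (as in In [32]) and executing the generated string (as in In [35]) describe the same plot.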